We have received the DAMD data. Let's explore it as a graph.
In [ ]:
import pandas as pd
If you place the data file in the same directory as this Notebook, it can be read into Python.
In [ ]:
damd = pd.read_csv("20170718 hashtag_damd uncleaned.csv")
damd.head(3)
Describe in words: what is the shape of the data? What is an item of data? What do we know about each of the items? Do you already have an idea of how you would like to start analyzing such data?
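One way to start answering these questions is to ask pandas directly. The sketch below uses a tiny made-up table in place of the real file. The column names tweet_id and hashtags come from the Table 2 Net exercise further down; the text column and all values are invented for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the DAMD file: three invented rows.
sample = pd.DataFrame(
    {
        "tweet_id": [101, 102, 103],
        "text": ["first tweet", "second tweet", "third tweet"],
        "hashtags": ["damd;health", "damd", "damd;data;privacy"],
    }
)

n_rows, n_cols = sample.shape        # (number of rows, number of columns)
column_names = list(sample.columns)  # what we know about each item

print(n_rows, "rows x", n_cols, "columns")
print(column_names)
print(sample.dtypes)                 # what kind of value each column holds
```

On the real data, the same calls on damd (damd.shape, damd.columns, damd.dtypes) give you the figures to describe in words.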
To explore the DAMD data, let's conceptualize how the rows are related to one another. Let's imagine a graph. Don't hesitate to grab pen and paper or use the whiteboards.
We can use Table 2 Net to build such a graph. The tool will give us a .gexf graph file. Build a bipartite graph with tweet_id and hashtags as the two types of nodes, separating the latter by a semicolon (;). Open the .gexf file with your browser. What does it look like? Is its shape different from that of the .csv file?
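The bipartite idea can also be sketched in a few lines of plain Python: every row contributes one edge per hashtag it mentions. The rows below are invented, bipartite_edges is a hypothetical helper rather than anything from Table 2 Net, and lowercasing the hashtags is an assumption made here for tidiness.

```python
def bipartite_edges(rows, sep=";"):
    """Return (tweet_id, hashtag) pairs, one per hashtag mentioned in a row."""
    edges = []
    for row in rows:
        cell = row.get("hashtags") or ""      # treat a missing cell as empty
        for tag in cell.split(sep):
            tag = tag.strip().lower()         # normalise case and whitespace (an assumption)
            if tag:                           # skip empty pieces such as "a;;b"
                edges.append((row["tweet_id"], tag))
    return edges


# Invented example rows, shaped like the two CSV columns named above.
rows = [
    {"tweet_id": 101, "hashtags": "damd;health"},
    {"tweet_id": 102, "hashtags": "DAMD"},
    {"tweet_id": 103, "hashtags": ""},
]
edges = bipartite_edges(rows)
print(edges)
```

Each pair links a node of one type (a tweet) to a node of the other type (a hashtag); no edge ever joins two tweets or two hashtags, which is what makes the graph bipartite.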
As it happens, ETHOS Lab has a small piece of code that turns a matrix into a graph. Please take a look. If you copy and paste the buildHashtagCooccurrenceGraph function definition below, and have run the code earlier in this notebook, you can create the graph in Python.
In [ ]:
import networkx as nx
# copy+paste the function definition below
def buildHashtagCooccurrenceGraph(tweets):
    g = ....
    .
    .
    return g
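If the ETHOS Lab function is not at hand, a rough stand-in can be sketched as follows. This is not the lab's implementation: sketch_tweet_hashtag_graph is a hypothetical name, the column names tweet_id and hashtags are taken from the Table 2 Net step, and the kind node attribute and the tweet: and # prefixes are assumptions made for this example.

```python
import networkx as nx
import pandas as pd


def sketch_tweet_hashtag_graph(tweets, sep=";"):
    """Build a bipartite tweet-hashtag graph from a DataFrame (hypothetical helper)."""
    g = nx.Graph()
    for tweet_id, cell in zip(tweets["tweet_id"], tweets["hashtags"]):
        if pd.isna(cell):                    # tweets without hashtags add no nodes
            continue
        tweet_node = f"tweet:{tweet_id}"     # prefix keeps ids and tags from colliding
        for tag in str(cell).split(sep):
            tag = tag.strip().lower()        # normalise case and whitespace (an assumption)
            if not tag:
                continue
            hashtag_node = f"#{tag}"
            g.add_node(tweet_node, kind="tweet")
            g.add_node(hashtag_node, kind="hashtag")
            g.add_edge(tweet_node, hashtag_node)
    return g


# Invented rows for a quick check.
sample = pd.DataFrame(
    {"tweet_id": [101, 102, 103], "hashtags": ["damd;health", "damd", None]}
)
sample_graph = sketch_tweet_hashtag_graph(sample)
print(sample_graph.number_of_nodes(), "nodes,", sample_graph.number_of_edges(), "edges")
```

Storing the node type as an attribute is a design choice: it should be carried into the .gexf file, so a tool such as Gephi can colour tweets and hashtags differently.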
In [ ]:
damd_graph = buildHashtagCooccurrenceGraph(damd)
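# nx.info() was removed in NetworkX 3.0, so print the node and edge counts directly
print(damd_graph.number_of_nodes(), "nodes,", damd_graph.number_of_edges(), "edges")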
In [ ]:
nx.write_gexf(damd_graph, "damd_graph.gexf")
Here is an example visualization of the DAMD data. It is a hashtag co-occurrence graph, with tweets in red and hashtags in green. Labels are shown for the top hashtags.
The central node, the one for the hashtag #damd, has been removed. Can you tell why?
The process for producing the above visualization, in Gephi, was approximately as follows:
Type
What qualitative questions came up when you explored the graphs? What quantitative questions came up?
Think about how what you did today answers what John is trying to research. How would you tweak it (in your mind)? How would you sketch it? What would your tweaks look like? Tweaks are conceptual, not code.